
Add immutable LIRS Cache implementation #155

Merged: 22 commits into twitter:develop from jcoveney's jco/lirscache branch, Dec 16, 2013

Conversation

@jcoveney (Contributor):

This fixes #97

new Stack[K](maxSize, backingIndexMap, backingKeyMap)
}

sealed class Stack[K](maxSize:Int, backingIndexMap:SortedMap[Int, K], backingKeyMap:Map[K, Int]) {
Contributor:

Minor, but did you mean to say final here instead of sealed? I don't see any subclasses of Stack in this file. I believe it's the same case for LIRSStacks.

Contributor (author):

I misunderstood what sealed does. What's the preferred way to make a class private? Just make it final? Make the constructor private?

Contributor:

Sealed is usually used when you have a fixed number of implementations, and they must be defined within the same source file. In this case maybe you'd want to just say private[storehaus]. I'm not sure what the other caches do, but at least you're signaling that it's private to the storehaus package and isn't meant for public consumption.
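
For example, a minimal sketch of what that signaling looks like (class and package names taken from this PR, body elided):

package com.twitter.storehaus.cache

// visible anywhere under com.twitter.storehaus, hidden from external users
private[storehaus] class Stack[K](maxSize: Int)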

Contributor (author):

Oscar didn't love all the privates. He suggested tossing it all in a private class if I really care, otherwise just leaving it. What do you think is the best way? I just don't want people to depend on internals.

Contributor:

Actually, it's funny that you mention that. Today I saw a pull request come in for Scala's new pickling support that attempted to make classes private that were not intended for external consumption. The initial approach was to mark things as "private[pickling]", but someone commented that you can actually still get access to them if you do something like "import pickling._" (there's some esoteric bug that was linked to). So instead he suggested putting these private classes in an "internal" package, like this: https://github.com/scala/async/tree/master/src/main/scala/scala/async/internal


@ryanlecompte (Contributor):

Hey @jcoveney, so I took a look at the CyclicIncrementer. I think it's a cool implementation, although it definitely adds some complexity to the code. I took some time to step back and figure out exactly what we want here: essentially a bounded stack backed by an efficient array-based data structure as well as a map. I took a stab at implementing a BoundedStack that doesn't rely on ever-increasing integers. It's backed by a Vector and a Map, and is fully immutable. I performed a few tests in the REPL and it seems to do the trick, and it implements the same API as your existing Stack. Do you think this could satisfy our needs? It might be a nice general utility to add to storehaus, and perhaps we could also use it for LRUCache. If you like it, I can submit a pull request that includes it:

object BoundedStack {
  def apply[A](maxSize: Int): BoundedStack[A] =
    new BoundedStack[A](maxSize, Vector.empty[A], Map.empty[A, Int])
}

class BoundedStack[A](
  maxSize: Int,
  backingStack: Vector[A] = Vector.empty,
  backingIndexMap: Map[A, Int] = Map.empty) {

  // moves elem to the top, returning any evicted element and the new stack
  def putOnTop(elem: A): (Option[A], BoundedStack[A]) = {
    backingIndexMap.get(elem) match {
      case Some(idx) =>
        val newBackingStack = dropFromStack(idx) :+ elem
        val newIdx = newBackingStack.size - 1
        val newBackingIndexMap = backingIndexMap + (elem -> newIdx)
        (None, new BoundedStack(maxSize, newBackingStack, newBackingIndexMap))
      case None =>
        if (isFull) {
          val oldElem = backingStack.head
          val newBackingStack = dropFromStack(0) :+ elem
          val newIdx = newBackingStack.size - 1
          val newBackingIndexMap = backingIndexMap - oldElem + (elem -> newIdx)
          (Some(oldElem), new BoundedStack(maxSize, newBackingStack, newBackingIndexMap))
        } else {
          val newBackingStack = backingStack :+ elem
          val newIdx = newBackingStack.size - 1
          val newBackingIndexMap = backingIndexMap + (elem -> newIdx)
          (None, new BoundedStack(maxSize, newBackingStack, newBackingIndexMap))
        }
    }
  }

  def dropOldest: (Option[A], BoundedStack[A]) = {
    backingStack.headOption match {
      case e @ Some(elem) =>
        val newBackingStack = dropFromStack(0)
        val newBackingIndexMap = backingIndexMap - elem
        (e, new BoundedStack(maxSize, newBackingStack, newBackingIndexMap))
      case None =>
        (None, this)
    }
  }

  def remove(elem: A): (Option[A], BoundedStack[A]) = {
    backingIndexMap.get(elem) match {
      case Some(idx) =>
        val newBackingStack = dropFromStack(idx)
        val newBackingIndexMap = backingIndexMap - elem
        (Some(elem), new BoundedStack(maxSize, newBackingStack, newBackingIndexMap))
      case None =>
        (None, this)
    }
  }

  def contains(elem: A): Boolean = backingIndexMap.contains(elem)

  def isOldest(elem: A): Boolean = backingStack.headOption.exists { _ == elem }

  def size: Int = backingStack.size

  def isFull: Boolean = size >= maxSize

  def isEmpty: Boolean = size == 0

  def empty: BoundedStack[A] = new BoundedStack[A](maxSize, Vector.empty, Map.empty)

  override def toString = backingStack.mkString(",")

  // drops the element at idx; note that patch shifts every later element
  // left by one, so those elements' entries in backingIndexMap go stale
  private def dropFromStack(idx: Int): Vector[A] = backingStack.patch(idx, Seq.empty, 1)
}
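
For illustration, a quick REPL-style sketch of the API above (assuming the code as given):

val (_, s1) = BoundedStack[String](2).putOnTop("a")
val (_, s2) = s1.putOnTop("b")
val (evicted, s3) = s2.putOnTop("c") // full: the oldest element, "a", falls off
// evicted == Some("a"); s3.toString == "b,c"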

@ryanlecompte (Contributor):

Also, Vector is effectively constant-time for appends: http://docs.scala-lang.org/overviews/collections/performance-characteristics.html

@jcoveney (Contributor, author):

Hey, I really like this idea. I actually originally implemented it with a Vector as well, as they are a sweet, underused data type. The one thing I'm not sure about is how patch is implemented. It seems like vectors are optimized for efficient operations on the ends, but I'm not sure how they do random removals. If that is efficient, then it is definitely a great data structure to use. I just need to look at the implementation of patch. Do you know if they can drop things efficiently? Because if not, that is the big loss here.

@ryanlecompte (Contributor):

So, I did a bit of digging into this and came across this thread: https://groups.google.com/forum/#!topic/scala-user/fZ1TTNgneW4

It looks like some people suggest using patch, while others came up with less visually appealing solutions that may perform better. I ran a quick benchmark using Thyme on a Vector with 100K elements: removing an item at an arbitrary index using patch is painfully slow, because it seems to end up reshuffling/rebuilding the Vector from scratch (unlike Vector#updated, which is very fast but only allows replacing an element, not removing it).

However, I wonder if we are just being overly paranoid in general? I think we could probably safely just convert your original implementation to use a Long instead of an Int. Let's think about it:

Let's assume that a LIRSCache gets accessed roughly 100K times a second (very aggressive for a single machine), and that each access increments the index. If the cache consistently gets hit 100K times a second for an entire year, that's 100K * 31,536,000 seconds in a year = 3,153,600,000,000 increments. Long.MaxValue is 9,223,372,036,854,775,807. So, at that aggressive rate of 100K hits/s, you would end up with Long.MaxValue / 3,153,600,000,000 ≈ 2,924,712 years before it overflows!
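
The same arithmetic as a quick REPL sketch:

// back-of-the-envelope overflow horizon for a Long access counter
val hitsPerSecond   = 100000L
val secondsPerYear  = 31536000L                       // 365 * 24 * 60 * 60
val hitsPerYear     = hitsPerSecond * secondsPerYear  // 3153600000000
val yearsToOverflow = Long.MaxValue / hitsPerYear     // 2924712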

I am leaning towards switching to Long in your original implementation and avoiding the complications added by the CyclicIncrementer. Thoughts?

@johnynek (Collaborator):

How about we split the difference:

trait IdProvider[T] {
  def next: (T, IdProvider[T])
  def cull(t: T): IdProvider[T]
}

case class LongProvider(current: Long) extends IdProvider[Long] {
  def next = (current, LongProvider(current + 1L))
  def cull(l: Long) = this // no-op: Longs are never recycled
}

object IdProvider {
  implicit def longCounter: IdProvider[Long] = LongProvider(0L)
}

Then implement this typeclass (which you have already done) for CyclicIncrement[T] so we have:

object CyclicIncrement {
  // wrap your CyclicIncrementProvider (add whatever implicit evidence is
  // needed here, or just fix T = Int)
  implicit def idProvider[T: Ordering: Successible]: IdProvider[CyclicIncrement[T]] = ???
}

Lastly, Stack can be made to use a generic provider with the typeclass pattern: Ryan can inject the LongProvider by using Id == Long, and you can use the cyclic version with Id == CyclicIncrement[Int].
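
A rough sketch of that shape, reusing the IdProvider trait above (hypothetical signature; parameter names are illustrative):

import scala.collection.SortedMap

// the id type is abstract: LongProvider plugs in Id = Long, and the cyclic
// provider plugs in Id = CyclicIncrement[Int]
class Stack[K, Id: Ordering](maxSize: Int,
                             idProvider: IdProvider[Id],
                             backingIndexMap: SortedMap[Id, K],
                             backingKeyMap: Map[K, Id])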

@ryanlecompte (Contributor):

That's a really nice way to split the difference and a sweet use of the typeclass pattern. I really wish Scala had a scala.collection.immutable.LinkedHashMap. Unfortunately there's just a mutable one, but that's pretty much exactly what we want here (i.e., a bounded LinkedHashMap). Here's a quick version of the stack using a mutable LinkedHashMap. It certainly simplifies things, but it would of course require synchronization, and it only really makes sense for a mutable version of a LIRS cache. It would actually be interesting to see the performance difference between a thread-safe mutable LIRS cache and the immutable one:

import scala.collection.mutable

object BoundedStack {
  def apply[A](maxSize: Int): BoundedStack[A] = new BoundedStack[A](maxSize)
}

class BoundedStack[A](maxSize: Int) {
  private val backingMap = mutable.LinkedHashMap.empty[A, A]

  // moves elem to the top (back of the insertion order), returning any
  // evicted element
  def putOnTop(elem: A): Option[A] = {
    backingMap.get(elem) match {
      case Some(_) =>
        backingMap -= elem
        insert(elem)
        None
      case None =>
        if (isFull) {
          val oldElem = dropOldest
          insert(elem)
          oldElem
        } else {
          insert(elem)
          None
        }
    }
  }

  def dropOldest: Option[A] = {
    backingMap.headOption match {
      case Some((elem, _)) =>
        backingMap -= elem
        Some(elem)
      case None =>
        None
    }
  }

  def remove(elem: A): Option[A] = {
    backingMap.get(elem) match {
      case s @ Some(_) =>
        backingMap -= elem
        s
      case None =>
        None
    }
  }

  def contains(elem: A): Boolean = backingMap.contains(elem)

  def isOldest(elem: A): Boolean = backingMap.keys.headOption.exists { _ == elem }

  def size: Int = backingMap.size

  def isFull: Boolean = size >= maxSize

  def isEmpty: Boolean = size == 0

  def empty: BoundedStack[A] = new BoundedStack[A](maxSize)

  override def toString = backingMap.keys.mkString(",")

  // remove-then-insert (see putOnTop) is what moves an existing key to the
  // back; a plain update would keep the key's original insertion position
  private def insert(elem: A) { backingMap += elem -> elem }
}

@jcoveney (Contributor, author):

Another data structure that could work is the one that clojure uses, the priority map: https://github.com/clojure/data.priority-map

@jcoveney (Contributor, author):

As far as my thoughts: I agree that going with the trait is the way to go; good suggestion, Oscar. That said, I don't think we should go with the Vector approach. And yeah, I'd be curious to compare the immutable implementation with the mutable one...though the wins from making it immutable, of course, transcend performance concerns.

@ryanlecompte (Contributor):

Indeed. :) Agree with everything!

@johnynek (Collaborator):

Looks like we are converging here. Nice. One thought: rather than IdProvider[T], what about Clock[T]? Basically, it gives you a now and a tick to go to the next "now". MutableClock[T] would be one where tick returns Unit and now returns the latest time (like a wrapper around System.currentTimeMillis). Lastly, note that Snowflake, Twitter's unique ID generator, is a kind of MutableClock.

Just an idea. Maybe IdProvider is more to the point, but I liked the analogy to a kind of abstract Time.
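
A minimal sketch of that idea (hypothetical; the two-parameter shape matches how IdProvider ends up extending Clock in the diff below):

// tick hands back the current "time" together with the advanced clock; an
// immutable counter and a System.currentTimeMillis wrapper can both fit
trait Clock[@specialized(Int, Long) T, This <: Clock[T, This]] {
  def tick: (T, This)
}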

@jcoveney (Contributor, author) commented Nov 5, 2013:

I incorporated the clock idea, added some tests, and I think it's good to go. If someone (maybe me!) implements a priority map, we can use that in these cache implementations (and in Scala, period, hah) and it'd be really cool.

@ryanlecompte (Contributor):

Nice! I like it. Great work. Immutable data structures are so much fun. I had fun writing an immutable patricia trie (prefix map) over the weekend: https://gist.github.com/ryanlecompte/7287415
Immutability FTW!

}

trait IdProvider[@specialized(Int, Long) T] extends Clock[T, IdProvider[T]] {
  def cull(oldId: T): IdProvider[T]
Collaborator:

Comment on the meaning here. (A bit like "free" in a malloc sense, right?)

@jcoveney (Contributor, author):

Note that this cannot be included until a new Algebird is pushed, because it uses Successible.

@jcoveney (Contributor, author):

With the new Algebird release, this now passes and can be merged :)

object CyclicIncrementProvider {
  def intIncrementer: CyclicIncrementProvider[Int] = CyclicIncrementProvider(0)

  def apply[@specialized(Int, Long) K: Ordering: Successible](zero: K): CyclicIncrementProvider[K] =
Collaborator:

Doesn't Successible have an ordering? You could weaken this to just Successible and get the ordering from there, right?
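
A sketch of that weakening, assuming Successible exposes its Ordering (the member name `ordering` is an assumption about the Algebird API):

def apply[@specialized(Int, Long) K: Successible](zero: K): CyclicIncrementProvider[K] = {
  // recover the Ordering from the Successible evidence instead of asking
  // for a second context bound (assumes Successible[K] has `ordering`)
  implicit val ord: Ordering[K] = implicitly[Successible[K]].ordering
  ??? // construct the provider exactly as before
}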

johnynek added a commit referencing this pull request on Dec 16, 2013: "Add immutable LIRS Cache implementation"

@johnynek merged commit fa32ddb into twitter:develop on Dec 16, 2013
@johnynek (Collaborator):

By the way, it would be nice to have some benchmarking code for cache hit-rate under a variety of distributions on the key space (e.g. probability of seeing key i is ~ 1/i for N keys).
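
A hypothetical sketch of such a benchmark; the (contains, touch) parameters stand in for whatever the real cache API exposes, and all names are illustrative:

import scala.util.Random

// draws keys with P(key = i) proportional to 1/i over n keys (inverse-CDF
// sampling) and folds them through a cache, counting lookup hits
def hitRate[C](empty: C,
               contains: (C, Int) => Boolean,
               touch: (C, Int) => C,
               n: Int, draws: Int, seed: Long = 42L): Double = {
  val rng = new Random(seed)
  val hN = (1 to n).map(1.0 / _).sum  // normalizing constant H_n
  def zipfKey(): Int = {
    var u = rng.nextDouble() * hN
    var i = 1
    while (i < n && u > 1.0 / i) { u -= 1.0 / i; i += 1 }
    i
  }
  var cache = empty
  var hits = 0
  for (_ <- 1 to draws) {
    val k = zipfKey()
    if (contains(cache, k)) hits += 1
    cache = touch(cache, k)
  }
  hits.toDouble / draws
}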

@jcoveney deleted the jco/lirscache branch July 15, 2014 16:58